134 research outputs found

    A direct measure of discriminant and characteristic capability for classifier building and assessment

    Performance measures are used in various stages of the process aimed at solving a classification problem. Unfortunately, most of these measures are in fact biased, meaning that they strictly depend on the class ratio, i.e., on the imbalance between negative and positive samples. After pointing to the source of bias for the best-known measures, novel unbiased measures are defined which are able to capture the concepts of discriminant and characteristic capability. The combined use of these measures can give important information to researchers involved in machine learning or pattern recognition tasks, in particular for classifier performance assessment and feature selection.

    A Parametric Hierarchical Planner for Experimenting Abstraction Techniques

    This paper presents a parametric system, devised and implemented to perform hierarchical planning by delegating the actual search to an external planner (the "parameter") at any level of abstraction, including the ground one. To give better insight into whether abstract spaces can be exploited to solve complex planning problems, comparisons have been made between instances of the hierarchical planner and their non-hierarchical counterparts. To improve the significance of the results, three different planners have been selected and used in the experiments. To facilitate the setup of experimental environments, a novel semi-automatic technique, used to generate abstraction hierarchies starting from ground-level domain descriptions, is also described.

    A Route Confidence Evaluation Method for Reliable Hierarchical Text Categorization

    Hierarchical Text Categorization (HTC) is becoming increasingly important with the rapidly growing amount of text data available on the World Wide Web. Among the different strategies proposed to cope with HTC, the Local Classifier per Node (LCN) approach attains good performance by mirroring the underlying class hierarchy while enforcing a top-down strategy in the testing step. However, the problem of embedding hierarchical information (parent-child relationships) to improve the performance of HTC systems still remains open. A confidence evaluation method for a selected route in the hierarchy is proposed to evaluate the reliability of the final candidate labels in an HTC system. To exploit the information embedded in the hierarchy, weight factors are used to account for the importance of each level. An acceptance/rejection strategy in the top-down decision-making process is proposed, which improves the overall categorization accuracy by rejecting a small percentage of samples, i.e., those with a low reliability score. Experimental results on the Reuters benchmark dataset (RCV1-v2) confirm the effectiveness of the proposed method, compared to other state-of-the-art HTC methods.
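The abstract above does not give the confidence formula, so the following is only a minimal sketch of the general idea: per-level classifier scores along one top-down route are combined with level weights, and the route is rejected when the combined score falls below a threshold. The weighted average, the function names `route_confidence` and `accept_route`, and the threshold value are all assumptions made for illustration, not the authors' method.

```python
from typing import Sequence


def route_confidence(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-level scores along one root-to-leaf route.

    Assumption: a weighted average, with one weight per hierarchy level.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


def accept_route(scores: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Accept the candidate labels only if the route confidence reaches the threshold."""
    return route_confidence(scores, weights) >= threshold


if __name__ == "__main__":
    level_weights = [3.0, 2.0, 1.0]  # hypothetical: upper levels count more
    print(accept_route([0.9, 0.8, 0.6], level_weights, threshold=0.7))
    print(accept_route([0.9, 0.3, 0.2], level_weights, threshold=0.7))
```

Raising the threshold rejects more samples; the abstract reports that rejecting a small fraction of low-reliability samples improves accuracy on the accepted ones.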

    PACMAS: A Personalized, Adaptive, and Cooperative MultiAgent System Architecture

    In this paper, a generic architecture, designed to support the implementation of applications aimed at managing information among different and heterogeneous sources, is presented. Information is filtered and organized according to personal interests explicitly stated by the user. User profiles are improved and refined over time by suitable adaptation techniques. The overall architecture has been called PACMAS, as it supports the implementation of Personalized, Adaptive, and Cooperative MultiAgent Systems. PACMAS agents are autonomous and flexible, and can be made personal, adaptive and cooperative, depending on the given application. The peculiarities of the architecture are highlighted by illustrating three relevant case studies focused on supporting undergraduate and graduate students, on predicting protein secondary structure, and on classifying newspaper articles, respectively.

    A text classification framework based on optimized error correcting output code

    In recent years, there has been increasing interest in using text classifiers for retrieving and filtering information from web sources. As the number of categories in this kind of software application can be high, Error-Correcting Output Coding (ECOC) can be a valid approach to perform multi-class classification. This paper explores the use of ECOC for learning text classifiers using two kinds of dichotomizers and compares them to each corresponding monolithic classifier. We propose a simulated annealing approach to calculate the coding matrix using an energy function similar to the electrostatic potential energy of a system of charges, which makes it possible to maximize the average distance between codewords with low variance. In addition, we use a new criterion for selecting features, a feature (in this specific context) being any term that may occur in a document. This criterion defines a measure of discriminant capability and allows terms to be ordered according to it. Three different measures have been tested for feature ranking/selection, in a comparative setting. Experimental results show that reducing the set of features used to train classifiers does not affect classification performance. Notably, feature selection is not a preprocessing activity valid for all dichotomizers. In fact, features are selected for each dichotomizer that occurs in the coding matrix, typically giving rise to a different subset of features depending on the dichotomizer at hand.
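As a rough illustration of the coding-matrix search described above, the sketch below runs simulated annealing over a ±1 matrix with a Coulomb-like energy. The abstract only says the energy is "similar to" electrostatic potential energy, so the specific form used here (sum of 1/d over all codeword pairs, with d the Hamming distance), the single-bit-flip move, the geometric cooling schedule, and every name and default value are assumptions, not the paper's actual procedure.

```python
import math
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def energy(matrix):
    """Assumed Coulomb-like energy: sum of 1/d over all codeword pairs.

    Close codewords contribute a lot, identical ones make the energy infinite.
    """
    total = 0.0
    for a, b in combinations(matrix, 2):
        d = hamming(a, b)
        if d == 0:
            return math.inf
        total += 1.0 / d
    return total


def anneal(n_classes, n_bits, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Search for a low-energy coding matrix by flipping one bit at a time."""
    rng = random.Random(seed)
    matrix = [[rng.choice((-1, 1)) for _ in range(n_bits)] for _ in range(n_classes)]
    current = energy(matrix)
    best, best_energy = [row[:] for row in matrix], current
    t = t0
    for _ in range(steps):
        i, j = rng.randrange(n_classes), rng.randrange(n_bits)
        matrix[i][j] = -matrix[i][j]  # propose a single bit flip
        candidate = energy(matrix)
        if candidate <= current:
            accept = True  # never worse: always accept
        elif math.isinf(candidate):
            accept = False  # would create duplicate codewords
        else:
            # Metropolis rule: accept a worse state with probability exp(-delta / t)
            accept = rng.random() < math.exp((current - candidate) / t)
        if accept:
            current = candidate
            if current < best_energy:
                best, best_energy = [row[:] for row in matrix], current
        else:
            matrix[i][j] = -matrix[i][j]  # undo the flip
        t *= cooling  # geometric cooling
    return best, best_energy


if __name__ == "__main__":
    codes, e = anneal(n_classes=4, n_bits=6)
    for row in codes:
        print(row)
    print("energy:", e)
```

Because 1/d grows sharply for small distances, minimizing this energy pushes all pairs apart rather than a few, which is one plausible way to obtain a large average distance with low variance.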

    Devising novel performance measures for assessing the behavior of multilayer perceptrons trained on regression tasks

    This methodological article is mainly aimed at establishing a bridge between classification and regression tasks, in a frame shaped by performance evaluation. More specifically, a general procedure for calculating performance measures is proposed, which can be applied to both classification and regression models. To this end, a notable change in the policy used to evaluate the confusion matrix is made, with the goal of reporting information about regression performance therein. This policy, called generalized token sharing, makes it possible to a) assess models trained on both classification and regression tasks, b) evaluate the importance of input features, and c) inspect the behavior of multilayer perceptrons by looking at their hidden layers. The occurrence of success and failure patterns at the hidden layers of multilayer perceptrons trained and tested on selected regression problems, together with the effectiveness of layer-wise training, is also discussed.

    Analysis of term roles along taxonomy nodes by adopting discriminant and characteristic capabilities

    Taxonomies are becoming essential to a growing number of applications, particularly in specific domains. Originally built by hand, taxonomies have recently become the focus of research on automatic generation. In particular, a main issue in automatic taxonomy building is the choice of the most suitable features. In this paper, we propose an analysis of how each feature changes its role along taxonomy nodes in a text categorization scenario, in which the features are the terms in textual documents. We deem that, in a hierarchical structure, each node should intuitively be represented with proper meaningful and discriminant terms (i.e., performing a feature selection task for each node), instead of considering a fixed feature space. To assess the discriminant power of a term, we adopt two novel metrics able to measure it. Our conjecture is that a term could significantly change its discriminant power (hence, its role) along the taxonomy levels. We perform experiments aimed at proving that a significant number of terms play different roles in each taxonomy node, which emphasizes the usefulness of a distinct feature selection for each node. We assert that this analysis should support automatic taxonomy building approaches.

    Automated taxonomy building by adopting discriminant and characteristic capabilities

    Taxonomies are becoming essential in several fields, playing an important role in a large number of applications, particularly in specific domains. Taxonomies give people an efficient tool for organizing a huge amount of information into a small hierarchical structure. Taxonomies were originally built by hand, but nowadays technology makes it possible to produce a vast amount of information. Consequently, recent research activities have focused on automated taxonomy generation. In this paper, we propose a novel approach for automatically building a taxonomy, starting from a set of categories. We deem that, in a hierarchical structure, each node should intuitively be represented with proper meaningful and discriminant features, instead of considering a fixed feature space. Our proposal relies on two metrics able to identify the most meaningful features. Our conjecture is that a feature could significantly change its discriminant power (hence, its role) along the taxonomy levels. Hence, we devise a greedy algorithm able to build a taxonomy by identifying the meaningful terms for each level. We perform preliminary experiments that point to the usefulness of the proposed approach.